Learning to Classify Data Streams with Imbalanced Class Distributions
Authors
Abstract
Streaming data is pervasive in data mining applications. A fundamental problem in mining streaming data is distributional drift over time. Streams may also exhibit high and varying degrees of class imbalance, which further complicates the task. In such scenarios, class imbalance is particularly difficult to overcome and has received comparatively little study. In this paper, we consider high class imbalance in conjunction with data streams. We propose a method called Boundary Definition, which builds classifiers by emphasizing the boundary cases as the streams arrive. We employ a sequential validation framework, which we believe is the most meaningful option in the context of streaming imbalanced data.
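The abstract does not spell out the sequential validation protocol. As a rough illustration only, the sketch below assumes one common reading: the stream arrives in chunks, and each chunk is scored by a model trained on the chunks before it. The toy class-mean learner, the function names, and the choice of minority-class recall as the metric are all assumptions made for this example; none of it is the paper's Boundary Definition method.

```python
from typing import Dict, List, Sequence, Tuple

# (feature, label); label 1 is treated as the minority class (an assumption).
Example = Tuple[float, int]


def fit_class_means(batch: Sequence[Example]) -> Dict[int, float]:
    """Toy learner (hypothetical): store the mean feature value of each class."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for x, y in batch:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(means: Dict[int, float], x: float) -> int:
    """Predict the class whose stored mean is closest to x."""
    return min(means, key=lambda y: abs(x - means[y]))


def sequential_validation(chunks: Sequence[Sequence[Example]]) -> List[float]:
    """Score each chunk with a model trained only on earlier chunks.

    Returns the minority-class recall for every test chunk that
    contains at least one minority example.
    """
    recalls: List[float] = []
    seen: List[Example] = []
    for chunk in chunks:
        if seen:
            means = fit_class_means(seen)
            positives = [x for x, y in chunk if y == 1]
            if positives:
                hits = sum(predict(means, x) == 1 for x in positives)
                recalls.append(hits / len(positives))
        # Only after testing does the chunk become training data.
        seen.extend(chunk)
    return recalls
```

Because every chunk is tested before it is used for training, the evaluation never sees labels from the future, which is the property that makes this kind of protocol natural for drifting streams. Reporting minority-class recall rather than accuracy is one way to keep the majority class from hiding errors on the rare class.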
Similar resources
Cost Sensitive Online Multiple Kernel Classification
Learning from data streams has been an important open research problem in the era of big data analytics. This paper investigates supervised machine learning techniques for mining data streams with application to online anomaly detection. Unlike conventional machine learning tasks, machine learning from data streams for online anomaly detection has several challenges: (i) data arriving sequentia...
Cost-Sensitive Perceptron Decision Trees for Imbalanced Drifting Data Streams
Mining streaming and drifting data is among the most popular contemporary applications of machine learning methods. Due to the potentially unbounded number of instances arriving rapidly, evolving concepts and limitations imposed on utilized computational resources, there is a need to develop efficient and adaptive algorithms that can handle such problems. These learning difficulties can be furt...
Data mining with imbalanced class distributions: concepts and methods
Some real-world data mining applications present imbalanced or skewed class distributions. In these domains, the underrepresented classes are often the ones we are more interested in. However, most learning algorithms are not able to induce meaningful classifiers in some imbalanced domains. One reason for this poor performance is that learning algorithms tend to focus on abundant classes to max...
Graph Classification with Imbalanced Class Distributions and Noise
Recent years have witnessed an increasing number of applications involving data with structural dependency and graph representations. For these applications, it is very common that their class distribution is imbalanced with minority samples being only a small portion of the population. Such imbalanced class distributions impose significant challenges to the learning algorithms. This problem is...
Online Class Imbalance Learning and its Applications in Fault Detection
Although class imbalance learning and online learning have been extensively studied in the literature separately, online class imbalance learning that considers the challenges of both fields has not drawn much attention. It deals with data streams having very skewed class distributions, such as fault diagnosis of real-time control monitoring systems and intrusion detection in computer networks. T...